[VNEXT] feat: Opt-In Anlytics #277

Draft: wants to merge 14 commits into base: vnext

Conversation

tankerkiller125 (Contributor) commented Oct 10, 2024

What type of PR is this?

  • feature

What this PR does / why we need it:

This PR adds basic analytics to help us understand what kinds of systems Homebox runs on, so we can better optimize it and understand its use. At no point is any sensitive private information shared with us, and the data is sent to our own self-hosted Plausible Analytics instance.

We do not store region- or city-level location data, but we may store the country from which instances send data (again, never anything identifiable).

This analytics data is currently only sent on application startup.

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced analytics data collection, allowing users to opt-in for non-identifiable analytics to optimize Homebox.
    • Added a new environment variable, HBOX_ALLOW_ANALYTICS, to control analytics settings.
  • Documentation Updates

    • Enhanced documentation to include new analytics features and privacy policies, clarifying data collection practices.
    • Added new sections detailing current analytics collected and privacy commitments, including a live data embed.
    • Updated installation documentation to include HBOX_ALLOW_ANALYTICS in Docker configurations.
  • Chores

    • Updated development setup to include tools for testing and linting, improving the development environment.

coderabbitai bot (Contributor) commented Oct 10, 2024

Walkthrough

The pull request introduces modifications across several files to enhance analytics data collection within the application. Key updates include the addition of a new AllowAnalytics configuration option, the creation of a new data structure for analytics, and the implementation of an HTTP request to send collected data. Documentation has also been updated to reflect these changes, including new sections on analytics and a privacy policy. Additionally, the go.mod file has been updated to include necessary dependencies for the new functionality.

Changes

| File | Change Summary |
|---|---|
| backend/app/api/main.go | Added imports for encoding/json and github.com/shirou/gopsutil/v4/host. Defined analyticsData struct. Implemented conditional analytics reporting based on cfg.Options.AllowAnalytics. Constructed HTTP POST request for analytics data. Enhanced error handling for JSON marshaling and HTTP requests. |
| backend/go.mod | Added direct requirement: github.com/shirou/gopsutil/v4 v4.24.9. Added several indirect dependencies. |
| backend/internal/sys/config/conf.go | Added AllowAnalytics boolean field in Options struct. |
| docs/en/configure-homebox.md | Updated to include HBOX_ALLOW_ANALYTICS environment variable with default value false. |
| docs/en/installation.md | Included HBOX_ALLOW_ANALYTICS in Docker run command and Docker Compose configuration. |
| docs/en/analytics/index.md | Added new sections for analytics data collection and transparency. Included iframe for analytics statistics. |
| docs/en/analytics/privacy.md | Introduced a privacy policy detailing data collection and user rights. |
| docs/.vitepress/menus/en.mts | Added new "Analytics" section with links to "Current Analytics" and "Privacy Policy". |
| Taskfile.yml | Updated setup task to install gotestsum and golangci-lint for improved development tools. |
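
For readers who want to picture the opt-in flow summarized above, here is a minimal sketch. It is not the PR's code: the `analyticsPayload` type and its fields are hypothetical stand-ins for `analyticsData` (whose real fields are not shown in this conversation), the flag is read straight from the environment rather than through Homebox's config package, and `runtime` is used in place of gopsutil to keep the example free of third-party dependencies.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"runtime"
	"strconv"
)

// analyticsPayload is a hypothetical stand-in for the PR's analyticsData
// struct; the real field set is not shown in this conversation.
type analyticsPayload struct {
	OS   string `json:"os"`
	Arch string `json:"arch"`
}

// analyticsAllowed reports whether the raw value of HBOX_ALLOW_ANALYTICS
// opts in. Unset or unparseable values are treated as false, which matches
// the documented default.
func analyticsAllowed(raw string) bool {
	allowed, err := strconv.ParseBool(raw)
	return err == nil && allowed
}

// buildPayload returns the JSON body to send, or nil when the operator has
// not opted in, so nothing is collected by default.
func buildPayload(allowed bool, goos, goarch string) ([]byte, error) {
	if !allowed {
		return nil, nil
	}
	return json.Marshal(analyticsPayload{OS: goos, Arch: goarch})
}

func main() {
	allowed := analyticsAllowed(os.Getenv("HBOX_ALLOW_ANALYTICS"))
	body, err := buildPayload(allowed, runtime.GOOS, runtime.GOARCH)
	if err != nil {
		fmt.Println("failed to marshal analytics payload:", err)
		return
	}
	if body == nil {
		fmt.Println("analytics disabled")
		return
	}
	fmt.Println(string(body))
}
```

Keeping the gate separate from the payload builder makes the "off unless explicitly enabled" behavior easy to test on its own.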

Possibly related PRs

  • Update vnext #314: The changes in this PR involve updates to the documentation structure, including the API section, which may relate to the analytics data collection enhancements in the main PR by providing clearer access to API documentation that could be used for analytics reporting.

Suggested labels

📖 documentation


Security Recommendations

  • Ensure that any data sent in the analytics HTTP POST request is anonymized and does not include sensitive user information.
  • Regularly review the privacy policy to ensure compliance with data protection regulations.

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between bcedca2 and 5fd241b.

📒 Files selected for processing (1)
  • docs/en/analytics/privacy.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/en/analytics/privacy.md

[uncategorized] ~22-~22: This verb does not appear to agree with the subject. Consider using a different form.
Context: ...a collected through Homebox's analytics are managed and stored in our self-hosted P...

(AI_EN_LECTOR_REPLACEMENT_VERB_AGREEMENT)


[style] ~45-~45: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... their data contributes to Homebox. * Right to withdraw consent: Users may opt ou...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[uncategorized] ~49-~49: A comma may be missing after the conjunctive/linking adverb ‘Currently’.
Context: ... providing the web services to operate. Currently this includes only Cloudflare, who hand...

(SENT_START_CONJUNCTIVE_LINKING_ADVERB_COMMA)


[grammar] ~49-~49: The word “cybersecurity” is spelled as one.
Context: ...s includes only Cloudflare, who handles cyber security services to our analytics endpoints and...

(CYBER_COMPOUNDS)


[style] ~53-~53: ‘prior to’ might be wordy. Consider a shorter alternative.
Context: ...or Reddit) at a minimum 7 full days prior to any change being conducted (unless mand...

(EN_WORDINESS_PREMIUM_PRIOR_TO)

🪛 Markdownlint (0.35.0)
docs/en/analytics/privacy.md

58-58: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


49-49: null
Bare URL used

(MD034, no-bare-urls)


62-62: null
Bare URL used

(MD034, no-bare-urls)


64-64: null
Bare URL used

(MD034, no-bare-urls)

🔇 Additional comments (2)
docs/en/analytics/privacy.md (2)

33-46: ⚠️ Potential issue

Define specific data retention periods and enhance compliance documentation.

The current indefinite retention policy needs refinement:

  1. Establish clear retention periods
  2. Define criteria for determining when data is no longer "useful"
  3. Document automated data purging mechanisms

Additionally, explicitly address compliance with:

  • GDPR (for EU users)
  • CCPA (for California users)
  • Other relevant privacy regulations

9-18: Specify the complete scope of data collection.

The current list of collected data points should be exhaustive to ensure transparency. Please clarify:

  1. What specific "default, anonymized data" is collected through Plausible
  2. Whether any additional data points might be added in the future
  3. How users will be notified of changes to data collection scope

Consider implementing a versioned schema for analytics data to track changes in collection scope over time.



cloudflare-workers-and-pages bot commented Oct 10, 2024

Deploying homebox-docs with Cloudflare Pages

Latest commit: fb366a8
Status: ✅  Deploy successful!
Preview URL: https://62942272.homebox-docs.pages.dev
Branch Preview URL: https://mk-add-analytics.homebox-docs.pages.dev


backend/app/api/main.go (outdated, resolved)
backend/app/api/main.go (outdated, resolved)
backend/app/api/main.go (resolved)
@tankerkiller125
Contributor Author

@coderabbitai full review

Contributor

coderabbitai bot commented Oct 10, 2024

✅ Actions performed

Full review triggered.

@tonyaellie
Collaborator

Are the analytics going to be publicly viewable?

@tankerkiller125
Contributor Author

I think we could make the analytics public, though we may have to take a deeper look to see what impact (if any) that would have on server load. I doubt it would increase the load in any significant way, so a public analytics dashboard is likely feasible, and I think it makes sense.

@tankerkiller125
Contributor Author

tankerkiller125 commented Oct 11, 2024

The public analytics view has been added to the docs in the most recent commits. I still need to fix the PR CI issues, and then this will probably be good to go (once we get the privacy policy sorted).

@tonyaellie
Collaborator

Probably worth tracking whether the instance is running in Docker, and the version of Homebox.

@zodac
Contributor

zodac commented Oct 13, 2024

Appreciate this being opt-in! Looks like pretty basic data so I'll be happy to enable, but glad to have the option. 👍

@tankerkiller125 tankerkiller125 changed the title feat: Opt-In Anlytics [VNEXT] feat: Opt-In Anlytics Oct 19, 2024
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (1)
docs/en/analytics/privacy.md (1)

54-60: Fix document formatting issues.

Several formatting issues need attention:

Apply these changes:

-### Contact Us
+## Contact Us

-* Discord: https://discord.homebox.software/ 
-* Reddit Modmail: r/Homebox
-* Github: https://git.homebox.software/ 
+* Discord: [Join our Discord](https://discord.homebox.software/)
+* Reddit Modmail: [r/Homebox](https://reddit.com/r/Homebox)
+* Github: [Homebox Repository](https://git.homebox.software/)
🧰 Tools
🪛 Markdownlint

54-54: Expected: h2; Actual: h3
Heading levels should only increment by one level at a time

(MD001, heading-increment)


58-58: null
Bare URL used

(MD034, no-bare-urls)


60-60: null
Bare URL used

(MD034, no-bare-urls)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between fb8647c and a8b4a0b.

📒 Files selected for processing (1)
  • docs/en/analytics/privacy.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/en/analytics/privacy.md

[style] ~45-~45: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... their data contributes to Homebox. * Right to withdraw consent: Users may opt ou...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~49-~49: ‘prior to’ might be wordy. Consider a shorter alternative.
Context: ...or Reddit) at a minimum 7 full days prior to any change being conducted (unless mand...

(EN_WORDINESS_PREMIUM_PRIOR_TO)


🔇 Additional comments (1)
docs/en/analytics/privacy.md (1)

9-18: ⚠️ Potential issue

Consider adding data transmission security details.

While the data collection points are well-defined, the policy should explicitly state how data is encrypted during transmission to ensure secure data transfer to the Plausible Analytics instance.

docs/en/analytics/privacy.md Show resolved Hide resolved
docs/en/analytics/privacy.md Show resolved Hide resolved
docs/en/analytics/privacy.md Show resolved Hide resolved
@tankerkiller125 tankerkiller125 changed the base branch from main to vnext October 29, 2024 20:35
Contributor

@coderabbitai coderabbitai bot left a comment


Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments.

Actionable comments posted: 1

🛑 Comments failed to post (1)
backend/app/api/main.go (1)

115-137: 💡 Codebase verification

⚠️ Potential issue

🔒 Security improvements needed in analytics HTTP request handling

The current implementation has several security concerns that should be addressed:

  1. Response body handling is not following best practices:
-err = res.Body.Close()
-if err != nil {
-    log.Error().Err(err).Msg("failed to send analytics request")
-}
+if res != nil {
+    defer res.Body.Close()
+    // Read and discard body to prevent resource leaks
+    _, _ = io.Copy(io.Discard, res.Body)
+    if res.StatusCode >= 400 {
+        log.Error().Int("status", res.StatusCode).Msg("analytics request failed")
+    }
+}
  2. HTTP client should be reused instead of creating a new one for each request:
+var analyticsClient = &http.Client{
+    Timeout: 10 * time.Second,
+}
+
 func sendAnalytics() {
-    client := &http.Client{
-        Timeout: 10 * time.Second,
-    }
-    res, err := client.Do(req)
+    res, err := analyticsClient.Do(req)
  3. Error handling should be more robust and avoid potential nil pointer dereference:
 if err != nil {
-    log.Error().Err(err).Msg("failed to send analytics request")
+    log.Error().Msg("analytics request failed: network or timeout error")
+    return
 }

These changes will improve security, resource management, and error handling robustness.

🔗 Analysis chain

Enhance security measures in analytics HTTP request handling.

While the implementation uses HTTPS and includes basic error handling, several security improvements are recommended:

  1. Add certificate verification settings:
 client := &http.Client{
     Timeout: 10 * time.Second,
+    Transport: &http.Transport{
+        TLSClientConfig: &tls.Config{
+            MinVersion: tls.VersionTLS12,
+        },
+    },
 }
  2. Improve error handling to prevent information leakage:
-if err != nil {
-    log.Error().Err(err).Msg("failed to send analytics request")
+if err != nil {
+    log.Error().Msg("analytics request failed")
 }
  3. Add proper response body handling:
-err = res.Body.Close()
+if res != nil {
+    defer res.Body.Close()
+    if res.StatusCode != http.StatusOK {
+        log.Error().Msg("unexpected status from analytics endpoint")
+    }
+}

Also consider:

  • Implementing retry mechanism with exponential backoff
  • Adding request context with timeout
  • Implementing rate limiting

Let me analyze the HTTP client implementation more thoroughly to provide accurate security recommendations.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for proper TLS configuration in the codebase
rg -A 5 "tls.Config" 

Length of output: 230


Script:

#!/bin/bash
# Search for HTTP client configurations and TLS settings
rg -A 10 "http.Client" 

# Search for any existing TLS imports
rg "crypto/tls"

# Look for any transport configurations
rg "http.Transport"

# Check for response body handling patterns
rg -A 5 "Body.Close"

Length of output: 888

Contributor

@coderabbitai coderabbitai bot left a comment


Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments.

Actionable comments posted: 2

🛑 Comments failed to post (2)
docs/en/analytics/privacy.md (2)

20-32: ⚠️ Potential issue

Enhance security measures documentation.

While self-hosting Plausible provides control over data, the policy should address:

  1. Access controls and authentication for the Plausible instance
  2. Data encryption at rest and in transit
  3. Regular security audits and monitoring
  4. Incident response procedures

Consider adding a section detailing these security aspects to provide transparency about data protection measures.



47-64: 🛠️ Refactor suggestion

Strengthen third-party security requirements and change management.

  1. Third-party security:

    • Document security requirements for third-party services
    • Specify data processing agreements
    • Regular security assessment procedures
  2. Change management:

    • Consider adding in-app notifications for policy changes
    • Implement version control for privacy policy
    • Document change approval process

Fix the bare URLs and heading structure:

-* Discord: https://discord.homebox.software/ 
-* Reddit Modmail: r/Homebox
-* Github: https://git.homebox.software/ 
+* [Discord](https://discord.homebox.software/)
+* [Reddit Modmail](https://www.reddit.com/r/Homebox)
+* [Github](https://git.homebox.software/)

-### Contact Us
+## Contact Us
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

# 6. 3rd Parties

Homebox may use 3rd parties as part of providing the web services to operate. Currently this includes only Cloudflare, who handles cyber security services to our analytics endpoints and websites. You can view their privacy policy at https://www.cloudflare.com/privacypolicy/

# 7. Policy Changes

Any changes to this privacy policy will be communicated to users through Homebox's update channels (namely Discord or Reddit) **at a minimum** 7 full days prior to any change being conducted (unless mandated by law to do so otherwise).

Continued use of Homebox following updates will imply acceptance of the revised policy, and users are free to opt-out of analytics at any point without impact to their usage of Homebox software. 


## Contact Us

For any questions about this privacy policy or your data, please contact the team through our official channels:

* [Discord](https://discord.homebox.software/)
* [Reddit Modmail](https://www.reddit.com/r/Homebox)
* [Github](https://git.homebox.software/)

Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants